Quickly Finding Orthologs as Reciprocal Best Hits with BLAT, LAST, and UBLAST: How Much Do We Miss?
Abstract
Reciprocal Best Hits (RBH) are a common proxy for orthology in comparative genomics. Essentially, an RBH is found when the proteins encoded by two genes, each in a different genome, find each other as the best-scoring match in the other genome. NCBI's BLAST is the software most commonly used for the sequence comparisons needed to find RBHs. Since sequence comparison can be time-consuming, we decided to compare the number and quality of RBHs detected using algorithms that run in a fraction of the time BLAST requires. We tested BLAT, LAST and UBLAST. All three programs ran in between one hundredth and one twenty-fifth of the time required to run BLAST. The number of homologs and RBHs found by the faster algorithms falls relative to BLAST as the genomes compared become more dissimilar, with BLAT, a program optimized for quickly finding very similar sequences, missing both the most homologs and the most RBHs. Though LAST produced the numbers of homologs and RBHs closest to those produced with BLAST, UBLAST was very close: either program found between 0.6 and 0.8 as many RBHs as BLAST between dissimilar genomes, while between more similar genomes the differences were barely apparent. UBLAST ran faster than LAST, making it the best option among the programs tested.
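The RBH criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the hits from each search direction have already been parsed into (query, subject, score) tuples, and the function names `best_hits` and `reciprocal_best_hits`, the gene identifiers, and the scores are all hypothetical. Ties are broken by keeping the first hit encountered, which is a simplifying assumption.

```python
from typing import Dict, Iterable, List, Tuple

# One hit: (query_id, subject_id, score). Higher score means a better match.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject.

    Assumption: on a tied score, the first hit seen is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]
) -> List[Tuple[str, str]]:
    """Return (gene_in_A, gene_in_B) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)  # genome A proteins searched against genome B
    best_ba = best_hits(b_vs_a)  # genome B proteins searched against genome A
    return sorted(
        (a, b) for a, b in best_ab.items() if best_ba.get(b) == a
    )


# Made-up example data: a3's best hit is b2, but b2's best hit is a2,
# so (a3, b2) is not reciprocal and is excluded.
example_a_vs_b = [
    ("a1", "b1", 250.0),
    ("a1", "b2", 90.0),
    ("a2", "b2", 180.0),
    ("a3", "b2", 120.0),
]
example_b_vs_a = [
    ("b1", "a1", 245.0),
    ("b2", "a2", 175.0),
    ("b2", "a3", 110.0),
    ("b3", "a3", 60.0),
]
print(reciprocal_best_hits(example_a_vs_b, example_b_vs_a))
```

Because the check only needs each program's best hit per query, the same sketch applies whether the hit tables come from BLAST or from one of the faster programs; what differs is which hits each program reports in the first place.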
Similar resources
Testing the Speed of Usearch and Blat in Comparison to Blast and Determining Their Sensitivity for Detecting Orthologs as Reciprocal Best Hits
Choosing BLAST options for better detection of orthologs as reciprocal best hits
MOTIVATION: The analysis of the increasing number of genome sequences requires shortcuts for the detection of orthologs, such as Reciprocal Best Hits (RBH), where orthologs are assumed if two genes, each in a different genome, find each other as the best hit in the other genome. Two BLAST options seem to affect alignment scores the most, and thus the choice of a best hit: the filtering of low info...
Reciprocal best hits are not a logically sufficient condition for orthology
It is common to use reciprocal best hits, also known as a boomerang criterion, for determining orthology between sequences. The best hits may be found by blast, or by other more recently developed algorithms. Previous work seems to have assumed that reciprocal best hits is a sufficient but not necessary condition for orthology. In this article, I explain why reciprocal best hits cannot logicall...
Orthologs from maxmer sequence context
Context-dependent identification of orthologs customarily relies on conserved gene order or whole-genome sequence alignment. It is shown here that short-range context—as short as single maximal matches—also provides an effective means to identify orthologs within whole genomes. On pristine (un-repeatmasked) mammalian whole-genome assemblies we perform a genome “intersection” that in general con...
Detecting putative orthologs
We developed an algorithm that improves upon the common procedure of taking reciprocal best BLAST hits (RBH) in the identification of orthologs. The method, the reciprocal smallest distance algorithm (RSD), relies on global sequence alignment and maximum likelihood estimation of evolutionary distances to detect orthologs between two genomes. RSD finds many putative orthologs missed by RBH because it i...